欢迎关注“小丫画图”公众号,同名知识星球等你加入

小丫微信: epigenomics E-mail:

作者:赵龙,中科院遗传所在读博士

擅长:ChIP-seq,MNase-seq,ATAC-seq,HiC,ChIA-PET,GWAS分析,R语言。

兴趣:单细胞RNA-seq,ATAC-seq,机器学习相关。

小丫编辑校验

需求描述

从形式上复现原图。

有时没有现成的包能画出一模一样的图,我们可以通过计算图中每个元素的位置,然后画出想要的效果。

出自https://www.nature.com/articles/s41467-019-09222-w

Fig. 2 Characteristics of glycosites identified with AI-ETD. f A glycoprotein-glycan network maps which glycans (outer circle, 117 total) modify which proteins (inner bar, 771 total). Glycoproteins are sorted by number of glycosites (scale to the right). Glycans are organized by classification, and edges are colored by the glycan node from which they originate, except for mannose-6-phosphate which has yellow edges. See Supplementary Fig. 11 and Supplementary Table 1 for glycan identifiers.

A glycoprotein-glycan network diagram in Fig. 2f maps which glycans (outer nodes) were observed on identified glycoproteins (inner column, organized by number of glycosites). Several discernable patterns appear, perhaps most notably the prevalence of high mannose glycosylation. The network diagram also indicates that the majority of fucosylated, paucimannose, and sialylated glycans occur on proteins with multiple glycosylation sites, and it indicates which glycans contribute more to heterogeneity. Supplementary Figure 11 provides a larger version of this network diagram with glycan identities in Supplementary Table 1.

图的解析

这个图展示两层信息:糖蛋白上glycan的种类,糖蛋白上glycosites的数量(数量可替换成其他分类信息,例如上调/下调,或所在的通路等等)。

原文方法描述是用igraph画的:The protein-glycan network, glycan co-occurrence networks, and glycosylation profiles for subcellular groups were created in R 3.2.2 using the igraph library70, and the arcplot were created with arcdiagram library。

这里用ggplot2画图,先计算位置再画图。

应用场景

展示分类、分组跟元素之间的关系。例如:glycan-数量-glycoprotein,GO-TF-targe gene、表观修饰-上下调-gene、enhancer-motif-promoter等。

以TF-GO-target gene互作为例,转录组找出几个GO 途径的基因可以放在外围的点,颜色代表不同GO term,点的大小代表基因表达水平或fold change。里面的矩形是我们共公共数据库找到的能够结合这些外围基因的转录因子。结果哪些转录因子激活基因,哪些抑制,就一目了然了。如果有HiC或ChIA-pet数据,hub 位点和靶基因的互作也是同理。

环境设置

使用国内镜像安装包

options("repos"= c(CRAN="https://mirrors.tuna.tsinghua.edu.cn/CRAN/"))
options(BioC_mirror="http://mirrors.ustc.edu.cn/bioc/")

加载包

library(ggplot2) #用于画图
library(dplyr) #用于数据处理
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
Sys.setenv(LANGUAGE = "en") #显示英文报错信息
options(stringsAsFactors = FALSE) #禁止chr转成factor

输入文件

easy_input_node.txt和easy_input_group.txt。从例文41467_2019_9222_MOESM5_ESM.xlsx文件整理而来。

Glycan <- read.table("easy_input_node.txt", sep = "\t", head=T)
head(Glycan)
##   Node                Glycan      Gly.clu
## 1    1             HexNAc(1) Paucimannose
## 2    2             HexNAc(2) Paucimannose
## 3    3       HexNAc(2)Fuc(1) Paucimannose
## 4    4       HexNAc(2)Hex(1) Paucimannose
## 5    5 HexNAc(2)Hex(1)Fuc(1) Paucimannose
## 6    6       HexNAc(2)Hex(2) Paucimannose
dim(Glycan)
## [1] 117   3
protein <- read.table("easy_input_group.txt",head=T)
head(protein)
##   Protein Count
## 1  Q9Z2W9     7
## 2  Q9Z2W8     5
## 3  Q9Z2L6     6
## 4  Q9Z2A9     1
## 5  Q9Z247     4
## 6  Q9Z218     2
##                                                                                                                    Glycan
## 1 HexNAc(6)Hex(3);HexNAc(2)Hex(8);HexNAc(2)Hex(7);HexNAc(2)Hex(6);HexNAc(2)Hex(5);HexNAc(6)Hex(4);HexNAc(4)Hex(4)NeuAc(1)
## 2                                         HexNAc(2)Hex(8);HexNAc(6)Hex(3);HexNAc(2)Hex(6);HexNAc(2)Hex(9);HexNAc(2)Hex(5)
## 3                   HexNAc(2)Hex(6);HexNAc(2)Hex(5);HexNAc(4)Hex(3)Fuc(1);HexNAc(2)Hex(7);HexNAc(6)Hex(3);HexNAc(2)Hex(8)
## 4                                                                                                         HexNAc(2)Hex(5)
## 5                                                         HexNAc(2)Hex(8);HexNAc(2)Hex(7);HexNAc(2)Hex(6);HexNAc(6)Hex(3)
## 6                                                                                         HexNAc(3)Hex(6);HexNAc(2)Hex(5)
dim(protein)
## [1] 771   3

计算图中各元素位置

计算点(Glycan)的坐标

一共有117个点,右侧(第一、四象限)59个,左侧(二、三象限)58个。

先计算每个点与x轴的角度,再通过与x轴的角度和三角函数计算出每个点的横纵坐标。

计算角度时,因为不是整个圆,上下各缺了一块,缺的角度为30度。所以第一个蛋白角度是75度,第59个蛋白是-75度,第60个蛋白是-107度,最后一个是-255度。

计算坐标时,考虑到后面还要画中央矩形,因此以蛋白数量/2作为半径。

Glycan$angle <- ifelse(Glycan$Node < 60,
                       75-(150/58*(Glycan$Node-1)),
                       75-(150/58*(Glycan$Node-1))-30)

Glycan$G.x <- cos(Glycan$angle*pi/180.0)*(nrow(protein)/2)
Glycan$G.y <- sin(Glycan$angle*pi/180.0)*(nrow(protein)/2)
head(Glycan)
##   Node                Glycan      Gly.clu    angle       G.x      G.y
## 1    1             HexNAc(1) Paucimannose 75.00000  99.77474 372.3644
## 2    2             HexNAc(2) Paucimannose 72.41379 116.47513 367.4831
## 3    3       HexNAc(2)Fuc(1) Paucimannose 69.82759 132.93825 361.8531
## 4    4       HexNAc(2)Hex(1) Paucimannose 67.24138 149.13056 355.4860
## 5    5 HexNAc(2)Hex(1)Fuc(1) Paucimannose 64.65517 165.01909 348.3948
## 6    6       HexNAc(2)Hex(2) Paucimannose 62.06897 180.57145 340.5939

计算最外围的弧线

外围的弧线内侧是圆半径的1.1倍,外侧是1.15倍

# 取Gly.clu列
curve <- Glycan[,3, drop=F]

curve$x.start <- Glycan$G.x*1.1
curve$x.end <- Glycan$G.x*1.15
curve$y.start <- Glycan$G.y*1.1
curve$y.end <- Glycan$G.y*1.15

# 给弧线之间留个空隙,根据自己的数据调整数量吧
curve <- curve[-c(9:10, #9是第一类Paucimannose的数量,以此类推
                  45:46,73:74,107:108),]
head(curve)
##        Gly.clu  x.start    x.end  y.start    y.end
## 1 Paucimannose 109.7522 114.7410 409.6008 428.2191
## 2 Paucimannose 128.1226 133.9464 404.2314 422.6055
## 3 Paucimannose 146.2321 152.8790 398.0384 416.1311
## 4 Paucimannose 164.0436 171.5001 391.0346 408.8089
## 5 Paucimannose 181.5210 189.7720 383.2343 400.6540
## 6 Paucimannose 198.6286 207.6572 374.6533 391.6830

中央矩形

根据glycosites数量分成5类,其中4和5一组,>5一组,与右侧图例对应。

protein.pos <- protein[,1:2]
protein.pos <- arrange(protein.pos, Count)

protein.pos$P.x <- 0
protein.pos$P.y <- seq(floor(nrow(protein)/2),-floor(nrow(protein)/2))

protein.pos$cluster <- ifelse(protein.pos$Count < 4, protein.pos$Count, ifelse(protein.pos$Count < 6, "4", "5"))

head(protein.pos)
##     Protein Count P.x P.y cluster
## 1    Q9Z2A9     1   0 385       1
## 2    Q9WVT6     1   0 384       1
## 3    Q9WVB4     1   0 383       1
## 4    Q9WV91     1   0 382       1
## 5    Q9WUH7     1   0 381       1
## 6 Q9R1V6-17     1   0 380       1

连接线,即Glycan-protein

connect <- protein[,c(1,3)]
Gly.name <- strsplit(as.character(connect[,2]),split=";")
connect.count <- sapply(Gly.name,length)
connection <- data.frame(Protein=rep(connect$Protein,connect.count),Glycan=unlist(Gly.name))

merge

seg.clu主要是为了画最后一个Glycan那一点点黄色

all <- merge(connection,Glycan,by="Glycan")
all1 <- merge(all,protein.pos,by="Protein")
all1$seg.clu <- ifelse(all1$Glycan=="HexNAc(2)Hex(6)Phospho(1)","zz",all1$Gly.clu)
head(all1)
##      Protein                Glycan Node        Gly.clu      angle
## 1 A0A087WPX3             HexNAc(1)    1   Paucimannose   75.00000
## 2 A0A087WPX3       HexNAc(6)Hex(4)   60 Complex/Hybrid -107.58621
## 3 A0A087WPX3       HexNAc(6)Hex(3)   59 Complex/Hybrid  -75.00000
## 4 A0A087WPX3 HexNAc(3)Hex(3)Fuc(1)   77    Fucosylated -151.55172
## 5 A0A087WPX3             HexNAc(2)    2   Paucimannose   72.41379
## 6 A0A087WPX3       HexNAc(2)Hex(9)  111   High Mannose -239.48276
##          G.x       G.y Count P.x  P.y cluster        seg.clu
## 1   99.77474  372.3644    27   0 -378       5   Paucimannose
## 2 -116.47513 -367.4831    27   0 -378       5 Complex/Hybrid
## 3   99.77474 -372.3644    27   0 -378       5 Complex/Hybrid
## 4 -338.94992 -183.6388    27   0 -378       5    Fucosylated
## 5  116.47513  367.4831    27   0 -378       5   Paucimannose
## 6 -195.75598  332.0991    27   0 -378       5   High Mannose

开始画图

# 自定义颜色
cols <- c("royalblue4","gray80","seagreen3","powderblue","steelblue","goldenrod1")
fills <- c("#FFFFD4","#FEE391","#FEC44F","#FE9929","#D95F0E")
# 查看颜色
library(scales)
show_col(cols)

show_col(fills)

# 画点
p <- ggplot(all1) +
  geom_point(aes(G.x,G.y,color=Gly.clu))
p

# 画连线
p1 <- p + 
  geom_segment(aes(x=G.x,y=G.y,xend=P.x,yend=P.y,color=seg.clu),alpha=0.1)
p1

# 画矩形
p2 <- p1 + 
  geom_rect(aes(xmin=-15,xmax=15,ymin=P.y,ymax=P.y+1,fill=cluster)) +
  geom_rect(aes(xmin=500,xmax=530,ymin=P.y,ymax=P.y+1,fill=cluster))
p2 

# 画弧线
p3 <- p2 + 
  geom_segment(data=curve,aes(x=x.start,xend=x.end,y=y.start,yend=y.end,color=Gly.clu),size=5)
p3

# 清空背景和坐标轴
p4 <- p3 +
  theme_minimal() +
  theme(
    legend.position = "none",
    axis.text = element_blank(),
    axis.title = element_blank(),
    panel.grid = element_blank()
    )  
p4

# 用自定义颜色填充和画线
p5 <- p4 +
  scale_color_manual(values=cols) +
  scale_fill_manual(values=fills)
p5

# 输出到文件
pdf(file="target.pdf", height=5, width=5.5)
 p5
dev.off()
## quartz_off_screen 
##                 2

后期处理

输出的PDF文件是矢量图,可以用PS或AI等矢量图编辑器打开,编辑图形和文字。

其实本来文字和数字都是可以在R中加的,但是还需要计算位置,莫不如在PS里修改。这里有一个PS的中间文件可以参考target.psdtarget.tif

在用到自己数据的时候,最好先把原数据跑一遍,理解每个数字的含义,和整体的思路,再去举一反三。这里涉及到一些数学的三角函数计算,和角度的计算,还是要理解每个数字的含义的。

有任何问题欢迎在小丫画图的知识星球中提问。

sessionInfo()
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS  10.15
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] zh_CN.UTF-8/zh_CN.UTF-8/zh_CN.UTF-8/C/zh_CN.UTF-8/zh_CN.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] scales_1.0.0  dplyr_0.8.3   ggplot2_3.2.1
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.2       knitr_1.24       magrittr_1.5     tidyselect_0.2.5
##  [5] munsell_0.5.0    colorspace_1.4-1 R6_2.4.0         rlang_0.4.0     
##  [9] stringr_1.4.0    tools_3.6.0      grid_3.6.0       gtable_0.3.0    
## [13] xfun_0.9         withr_2.1.2      htmltools_0.3.6  yaml_2.2.0      
## [17] lazyeval_0.2.2   digest_0.6.20    assertthat_0.2.1 tibble_2.1.3    
## [21] crayon_1.3.4     purrr_0.3.2      glue_1.3.1       evaluate_0.14   
## [25] rmarkdown_1.15   labeling_0.3     stringi_1.4.3    compiler_3.6.0  
## [29] pillar_1.4.2     pkgconfig_2.0.2